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Abstract: It is well-known that programs may fail due to exceptional behaviors, out-of-bound 
array accesses, or simply coding errors. Thus, they cannot be blindly trusted. Scientific computing 
programs make no exception in that respect, and even bring specific accuracy issues due to their 
massive use of floating-point computations. Yet, it is uncommon to guarantee their correctness. 
There exist methods and tools for proving the correct behavior of programs and we have extended 
them for the verification of an existing numerical analysis program. This C program implements the 
second-order centered finite difference explicit scheme for solving the ID wave equation. In fact, we 
have gone much further as we have mechanically verified the convergence of the numerical scheme 
in order to get a complete formal proof covering all aspects from partial differential equations to 
actual numerical results. To the best of our knowledge, this is the first time such a comprehensive 
proof is achieved. 

Key-words: Acoustic wave equation, Formal proof of numerical program, Convergence of nu- 
merical scheme, Rounding error analysis 
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Calculs en confiance : une preuve mecanisee de l'equation 
aux derivees partielles au programme effect if 

Resume : II est bien connu que les programmes peuvent echouer, que ce soit en raison de com- 
portements exceptionnels, d'acces hors des bornes d'un tableau, ou simplement d'erreurs de codage. 
On ne peut done pas leur faire confiance aveuglement. De ce point de vue, les programmes de cal- 
cul scientifiquc ne font pas exception, et ils presentent meme des problemes specifiques de precision 
en raison de l'utilisation massive de calculs en virgulc flottante. Cependant, leur correction est 
rarement garantie. Ils existe des methodes et des outils pour prouver le comportement correct 
de programmes et nous les avons etendus a la verification d'un programme d'analyse numerique 
cxistant. Ce programme C implemente le schema explicite aux differences finies centrees du second 
ordre pour la resolution de l'equation des ondes mono-dimensionnelle. En fait, nous sommes alles 
beaucoup plus loin puisque nous avons verific mccaniquement la convergence du schema numerique 
pour obtenir une preuve formelle complete couvrant tous les aspects de l'equation aux derivees 
partielles aux resultats numeriques effect ifs. A notre connaissance, e'est la premiere fois qu'une 
telle preuve complete est obtenuc. 

Mots-cles : equation des ondes acoustiques, preuve formelle d'un programme numerique, con- 
vergence d'un schema numerique, analyse d'erreur d'arrondi. 
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1 Introduction 

Given an appropriate set of mathematical equations (such as ODEs or PDEs) modeling a physical 
object, the usual simulation process is as follows. First, these equations are approximated by a set 
of discrete equations, the numerical scheme; this scheme is then proved to be convergent. Second, 
the numerical scheme is implemented as a computer program. The program is eventually run to 
perform simulations. 

The modeling of critical systems requires correctness of the modeling programs in the sense 
that there is no runtime error and the computed value is an accurate solution to the mathematical 
equations. The correctness of the program relies on the correctness of the two phases. 

Usually, the discretization phase is guaranteed by a pen-and-paper proof of the convergence 
of the selected scheme, while the implementation phase is simply validated by a set of tests. The 
drawback of pen-and-paper proofs is that human beings are fallible and errors may be left, for 
example in long and tedious proofs involving a large number of subcases. The drawback of test- 
based validation is that, except for exhaustive testing which is impossible here, it does not imply 
a proof of correctness in all possible cases. Therefore, one may overestimate the convergence rate, 
or miss coding errors, or underestimate round-off errors due to floating-point computations. In 
short, this methodology is not a way to obtain the correctness of modeling programs. 

The fallibility of pen-and-paper proofs and the limitations of validation by tests is not a new 
problem, and has been a fundamental concern for a long time in the computer science community. 
The answer to this question came from mathematical logic with the notion of logical framework and 
formal proof. A logical framework provides tools to describe mathematical objects and results, and 
state theorems to be proved. Then, the proof of those theorems has all its logical steps validated 
in the logical framework by a computer running a mechanical proof checker. This kind of proof 
forbids logical errors and prevents omissions: it is a formal proof. Therefore, a formal proof can 
be considered as a perfect pen-and-paper proof with neither hole nor oversight left inside. 

Fortunately, logical frameworks also support the definition of computer programs and the 
specification of their properties. Then, a correctness proof of a program is a formal proof that no 
execution of the program will go wrong and that the properties are always verified. A formal proof 
of a program can be considered as a comprehensive validation with an exhaustive set of tests. 

Mechanical proof checkers are mainly used to formalize mathematics and are routinely used 
to prove programs in the field of integer arithmetics and symbolic computation. We apply the 
same methodology to numerical programs in order to obtain the same safety level in the scientific 
computing field. The simulation process is revisited as follows. The discretization phase requires 
some preliminary work in the logical framework; we must implement the necessary mathematical 
concepts and results to describe continuous and discrete equations (in particular, the notion of 
convergent numerical scheme). Given this mathematical setting, we can write a faithful formal 
proof of the convergence of the discrete solution towards the continuous solution. Then, we can 
specify the modeling program and the properties of the computed values, and obtain a formal 
proof of its correctness. If we specify that computed values are close enough to the real solution 
of the numerical scheme, then the correctness proof of the program ensures the correctness of the 
whole simulation process. 

This revised simulation process seems easy enough. However, the difficulty of the necessary 
formal proofs must not be underestimated, notably because scientific computing adds specific dif- 
ficulties to specifications and proofs. The discretization phase uses real numbers and real analysis 
theory. Usual theorems and tools of real analysis are still in their infancy in mechanical proof 
checkers. In addition, numerical programs use floating-point arithmetic, and more specifically, the 
IEEE- 754 standard. Properties of floating-point arithmetic are complex and highly nonintuitive, 
which is yet another show stopper for formal proofs of numerical programs. 

To summarize, the field of scientific computing gathers regular formal proof difficulties for 
mathematics and programs, and the specific difficulties of real analysis and its relationships to 
floating-point arithmetic. This complexity explains why mechanical proof checkers are mostly 
unknown in scientific computing. Recent progress in mechanical proof checkers providing some 
real analysis and an IEEE-754 formalization now makes formal proofs of numerical programs 
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tractable. 

In this article, we conduct the formal proof of a very simple C program implementing the 
second-order centered finite difference explicit scheme for solving the one-dimensional acoustic 
wave equation. This is a first step towards the formal proof of more complex programs used 
in critical situations. This article complements a previous publication about the same experi- 
ment |12j . This time however, we do not focus on the advances of some formal proof techniques, 
but we rather present an overview of how formal methods can be useful for scientific computing 
and which cost they come at. 

Formal proof systems are relatively recent compared with mathematics or computer science. 
The system considered as the first proof assistant is Automath. It has been designed by de Bruijn 
in 1967 and has been very influential for the evolution of proof systems. As a matter of comparison, 
FORTRAN language was born in 1954. Almost all modern proof assistants then appeared in the 
1980s. In particular, the first version of Coq was created in 1984 by Coquand and Huet. The 
ability to reason about numerical programs came much later, as we have seen that knowledge 
of arithmetic and analysis is required. In Coq, real numbers have been formalized in 1999 and 
floating-point numbers in 2001. One can note that some of these developments were born from 
interactions between several domains, and so is this work. 

The formal proofs will be too long to be given here in extenso. Only general ideas and difficulties 
will be explained. The annotated C program and the full Coq sources of the formal development 
are available from 

http : / /f ost . saclay . inria.fr/wave_total_error . html 

The paper is organized as follows. The notion of formal proof and the main formal tools are 
presented in Section[2] Section[3]describes the PDE, the numerical scheme, and their mathematical 
properties. Sections [4] is devoted to the formal proof of the convergence of the numerical scheme, 
and Section [5] to the formal proof of the C program implementing the numerical scheme. Finally, 
Section [6] paints a broader picture of the study. 

A glossary of terms from the mathematical logic and computer science fields is given in ap- 
pendix. The main occurrences of such terms* are emphasized in the text using an italic font, and 
a star superscript. 

2 Formal Proof 

Modern mathematics can be seen as the science of abstract objects, e.g. real numbers, differential 
equations. In contrast, mathematical logic works on the languages used to define such abstract 
objects and reason about them. Once these languages of definitions and proofs are formalized, 
one can manipulate and reason about mathematical proofs: What makes a valid proof? How 
can we find one? And so on. This paves the way to two topics we are interested in: mechanical 
verification* of proofs, and automated deduction of theorems. In both cases, the use of computer- 
based tools will be paramount to the success of the approach. 

2.1 What is a Formal Proof? 

When it comes to abstract objects, believing that some properties are true requires some methods 
of judgment. Unfortunately, some of these methods might not be infallible: they might be incorrect 
in general, or their execution might be lacking in a particular setting. Logical reasoning aims at 
eliminating any unjustified assumption and ensuring that only infallible inferences are used, thus 
leading to properties that are believed to be true with the greatest confidence. 

The reasoning steps that are applied to deduce from a property believed to be true a new 
property believed to be true is called an inference rule* . They are usually handled at a syntactic 
level: only the form of the statements matter, their content does not. For instance, the modus 
ponens rule states that, if both properties "A" and "if A then B" hold, then property "B" holds 
too, whatever the meaning of A and B. Conversely, if one deduces "B" from "A" and "if C then 
then something is amiss: while the result might hold, its proof is definitely incorrect. 
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This is where formal proofs* show up. Indeed, since inference rules are simple manipulations 
of symbols, applying them or checking that they have been properly applied do not require much 
intelligence. (The intelligence lies in choosing which one to apply.) Therefore, this work can be 
delegated to a computer running a formal system. The computer will perform it much faster and 
much more systematically than a human ever could. Assuming that such formal systems have 
been designed with careQthe results they produce are true with the greatest confidence. 

The downside of formal proofs is that they are really low-level; they are down to the most 
elementary steps of a reasoning. It is no longer possible to skip some proof steps, dismissing them 
by a wave of hand, trusting the reader to be intelligent enough to fill the blanks. Fortunately, since 
inference rules are mechanical by nature, a formal system can also try to apply them automatically 
without any user interaction. Thus it will produce new results, or at least proofs of known results. 
At worst, one could imagine that a formal system applies inference rules blindly in sequence until a 
complete proof of a given result is found. In practice, clever algorithms have been designed to find 
the proper inference steps for domain-specific properties. This considerably eases the process of 
writing formal proofs. Note that numeric analysis is not amenable to automatic proving yet, which 
means that related properties will require a lot of human interaction, as shown in Section |6.1| 

It should have become apparent by now that formal systems are primarily aimed at proving 
and checking mathematical theorems. But programs are also among the abstract objects they 
can manipulate, thus allowing to prove theorems about them. These theorems might be about 
basic properties of a program, e.g. it will not crash. They might also be about higher-level 
properties, e.g. the computed results have such and such properties. For instance, in this paper, 
we are interested in proving that the values computed by the program are actually close to the 
continuous solution to the partial differential equation. Note that these verifications are said to 
be static: they are done once and for all, yet they cover all the future executions of a program. 

Formal verification of a program comes with a disclaimer though, since a program is not just an 
abstract object, it also has a concrete behavior once executed. Even if one has formally proved that 
a program always returns the expected value, mishaps might still happen. First and foremost, the 
specification* of what the program is expected to compute might be wrong or just incomplete. For 
instance, a random generator could be defined as being a function that takes no input and returns 
a value between and 1. One could then formally verify that a given function satisfies such a 
specification. Yet that does not tell anything about the actual randomness of the computed value: 
the function might well always return the same number while still satisfying the specification. 
This means that formal proofs do not completely remove the need for testing, as one still needs 
to make sure specifications are meaningful; but they considerably reduce the need for exhaustive 
testing. 

Another consideration regarding the range of the trust in formally verified programs stems from 
the fact that programs do not run in isolation, so formal methods have to make some assumptions. 
Basically, they assume that the program executed in the end is the one that was actually verified 
and not some variation of it. This seems an obvious assumption, but practice has shown that a 
program might be miscompiled, that some malware might be poking memory at random, or that 
a computer processor might have design flaws. So the trust in what a program actually computes 
will still be conditioned to the trust in the environment it is executed in. Note that this issue 
is not specific to verified programs, so they still have the upper hand over unverified programs. 
Moreover, people are also applying formal methods to improve the overall trust in a system: they 
are verifying hardware design on a daily basis, and are starting to formally verify compilers [3 2) 
and operating systems |30j too. 

2.2 Formal Proof Tools at Work 

There is not a single tool that would allow us to tackle the formal verification* of the C program 
we are interested in. We will use different tools depending on what kind of abstract objects we 

1 The core of a formal system is usually a very small program, much smaller than any proof it will later have to 
manipulate, and thus easy to check and trust. For instance, while expressive enough to tackle any proof of modern 
mathematics, the kernel of HOL Light is just 200-line long. 
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want to manipulate or prove properties about. 

The first step lies in running the tool Frama-C (with the Jessie plug-in) over the program. We 
have slightly modified the C program by adding comments stating what the program is expected 
to compute. These annotations* are just mathematical properties of the program variables, e.g. 
the result variables are close approximations to the values of the continuous solution. Except for 
these comments, the code was not modified. Frama-C takes the program and the annotations and 
it generates a set of theorems. What the tool guarantees is that, if we are able to prove all of 
these theorems, then the program is formally verified. Some of these theorems ensure that the 
execution will not cause exceptional behaviors: no accesses out of the bounds of the arrays, no 
overflow during computations, and so on. The other theorems ensure that the program satisfies 
all its annotations. 

At this point, we can run tools over the generated theorems, in the hope that they will au- 
tomatically find proofs of them. For instance, Gappa is suited for proving theorems stating that 
floating-point operations do not overflow or that their rounding error is bounded, while an SMT 
solver* will tackle theorems stating that arrays are never accessed out of their bounds. Unfor- 
tunately, the more complicated theorems require some user interaction, so we have used the Coq 
proof assistant to help us in writing their formal proofs. This is especially true for theorems that 
deal with the more mathematically-oriented aspect of the verification, e.g. the convergence of the 
numerical scheme. 

2.2.1 Coq 

Cocj^Jis a formal system that provides an expressive language to write mathematical definitions, ex- 
ecutable algorithms, and theorems, together with an interactive environment for proving them [7]. 
Coq's formal language combines both a higher-order logic* and a richly-typed functional program- 
ming language* |18j . In addition to functions and predicates, Coq allows to state mathematical 
theorems and software specifications*, and to interactively develop formal proofs of those. 

The architecture of Coq can be split into three parts. First, there is a relatively small kernel 
that is responsible for mechanically checking formal proofs. Given a theorem proved in Coq, one 
does not need to read and understand the proof to be sure that the theorem statement is correct, 
one just has to trust this kernel. For instance, Coq has been used to verify an EAL7-grad^] smart 
card [TS] . 

Second, Coq provides a proof development system so that the user does not have to write 
the low-level proofs that the kernel expects. There are some interactive proof methods (proof 
by induction, proof by contradiction, intermediate lemmas, and so on), some decision and semi- 
decision algorithms (e.g. proving the equality between polynomials), and a tactic* language for 
letting the user define its own proof methods. Note that all these high-level proof tools do not 
have to be trusted, since the kernel will check the low-level proofs they produce to ensure that all 
the theorems are properly proved. 

Third, Coq comes with a standard library, so that users do not have to start their proofs from 
scratch but can instead reuse well-known theorems that have already been formally proved be- 
forehand. This general-purpose library contains various developments and axiomatizations about 
sets, lists, sorting, arithmetic, real numbers, etc. In this work, we mainly use the Reals standard 
library [34], which is a classical axiomatization of an Archimedean ordered complete field. It pro- 
vides all the basic theorems about analysis, e.g. differentials, integrals. It does not contain more 
advanced topics such as Fourier transform and its properties though. Here is a short example for 
the irrationality of y/2: statement of the theorem and start of the proof. 

Definition irrational (x : R) : Prop := 

forall (p : Z) (q : nat), q ^ 0— > x^p/q. 



^http : //coq. inria. f r/ 

3 EAL stands for Evaluation Assurance Level, and 7 is the highest level of security defined by the Common 
Criteria standard (ISO 15408). As a point of comparison, the security level commonly met by smart cards is only 
EAL4. 
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Theorem i r rat i o n a I _sq rt _2 : irrational (sqrt 2). 
Proof. 

intros p q H HO. 

Qed. 



The standard library does not come with a formalization of floating-point numbers. For that 
purpose, we use a large Coq librarjj^] initially developed in [21] and extended with various results 
afterwards [5] . It is a high-level formalization of the IEEE-754 international standard for floating- 
point arithmetic. This formalization is convenient for human interactive proofs as testified by the 
numerous proofs using it. The huge number of lemmas available in the library (about 1400) makes 
it suitable for a large range of applications. This library has since then been superseded by the 
Flocq library j!5) and both libraries were used to prove the floating-point results of this work. 



2.2.2 Frama-C, Jessie, Why, and the SMT Solvers 



We use the Frama-C platforrrjj to perform formal verification of C programs at the source-code 
level. Frama-C is an extensible framework that combines static analyzers* for C programs, written 
as plug- ins, within a single tool. In this work, we use the Jessie plug- in for deductive verification* . 
C programs are annotated* with behavioral contracts written using the ANSI C Specification Lan- 
guage (ACSL for short) 5 . The Jessie plug-in translates them to the Jessie language [33], which 
is part of the Why verification platform [3S] . This part of the process is responsible for translating 
the semantics* of C into a set of Why logical definitions (to model C types, memory heap, etc.) 
and Why programs (to model C programs). Finally, the Why platform computes verification 
conditions* from these programs, using traditional techniques of weakest preconditions PU, and 
emits them to a wide set of existing theorem provers, ranging from interactive proof assistants* 
to automated theorem provers*. In this work, we use the Coq proof assistant (Section 2.2.1), 
SMT so lvers* Alt-Ergo [17], CVC3 [4] and Z3 23J, and the automated theorem prover Gappa 
(Section 2.2.31. Details about automated and interactive proofs can be found in Section 6.1 The 
dataflow from C source code to theorem provers can be depicted as follows: 



ACSL-annotated 
C program 



Frama-C 
(Jessie plug-in 




More precisely, to run the tools on a C program, we use a graphical interface called gWhy. A 
screenshot is displayed in Appendix[C] In this interface, we may call one prover on several goals*. 
We then get a graphical view of how many goals are proved and by which prover. 

In ACSL, annotations are written using first-order logic* . Following the programming by con- 
tract approach, the specifications involve preconditions, postconditions, and loop invariants* . The 
contract of the following function states that it computes the square of an integer x, or rather a 
lower bound on it. 



//@ requires x >— 0; 

//@ ensures \result * \result <= 

int squa re_root ( int x); 



http : //lipf orge . ens-lyon.fr/www/pff/ 
E http : //www. f rama-c . cea.fr/ 
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The precondition, introduced with requires, states that the argument x is nonnegative. When- 
ever this function is called, the toolchain will generate a theorem stating that the input is non- 
negative. The user then has to prove it to ensure the program is correct. The postcondition, 
introduced with ensures, states the property satisfied by the return value \result. An important 
point is that, in the specification, arithmetic operations are mathematical, not machine operations. 
In particular, the product \result * \result cannot overflow. Simply speaking, we can say that C 
integers are reflected within specifications as mathematical integers, in an obvious way. 

The translation of floating-point numbers is more subtle, since one needs to talk about both 
the value actually computed by an expression, and the ideal value that would have been computed 
if we had computers able to work on real numbers. For instance, the following excerpt from our 
C program specifies the relative error on the content of the dx variable, which represents the grid 



step Ax (see Section 3.2). The identifier dx represents the value actually computed (seen as a real 
number), while the expression \exact(dx) represents the value that would have been computed if 
mathematical operators had been used instead of floating-point operators. Note that Oxl.p-53 is 
a valid ACSL literal (and C too) meaning 1 • 2~ 53 (which is also the machine epsilon on binary64 
numbers). 



dx = 1 . / n i ; 




/*@ assert 




@ dx > 0. && dx <= 


0.5 && 


@ \abs(\exact(dx) - 


dx) / dx <= Oxl.p-53; 


@ */ 





2.2.3 Gappa 

The Gappa tooj^j adapts the interval- arithmetic* paradigm to the proof of properties that oc- 
cur when verifying numerical applications [20] . The inputs are logical formulas quantified over 
real numbers whose atoms are usually bounds on arithmetic expressions inside numeric intervals. 
Gappa answers whether it succeeded in verifying it. In order to support program verification, 
one can use rounding functions inside expressions. These unary operators take a real number 
and return the closest real number in a given direction that is representable in a given binary 
floating-point format. For instance, assuming that operator o rounds to the nearest binary64 
floating-point number, the following formula states that the relative error of the floating-point 
addition is bounded [2"5] : 

Vx, y£R, 3e € R, |erj < 2~ 53 and o(o(x) + o(y)) = (o(x) + o(y))(l + e). 

Converting straight-line numerical programs to Gappa logical formulas is easy and the user 
can provide additional hints if the tool were to fail to verify a property. The tool is specially 
designed to handle codes that are performing convoluted manipulations. For instance, it has been 
successfully used to verify a state-of-the-art library of correctly-rounded elementary functions [22] . 
In the current work, Gappa has been used to check much simpler properties. In particular, no 
user hint was needed to automatically prove them. Yet the length of their proofs would discourage 
even the most dedicated users if they were to be manually handled. One of the properties is the 



round-off error of a local evaluation of the numerical scheme (Section 5.1.1). Other properties 
mainly deal with proving that no exceptional behavior occurs while executing the program: due 
to the initial values, all the computed values arc sufficiently small to never cause overflow. 

The verification of some formulas requires reasonings that are so long and intricate [22] , that it 
might cast some doubts on whether an automatic tool actually succeeded in proving them. This 
is especially true when the tool ends up proving a property stronger than what the user expected. 
That is why Gappa also generates a formal proof that can be mechanically checked by a proof 
assistant. This feature has served as the basis for a Coq tactic* for automatically solving goals 
related to floating-point and real arithmetic [T3] ■ Note that Gappa itself is not verified, but since 
Coq verifies the proofs that Gappa generates, the goals are formally proved. 



t http : //gappa. gf orge . inria.fr/ 
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This tactic has been used whenever a verification condition would have been directly proved 
by Gappa, if not for some confusing notations or encodings of matrix elements. We just had to 
apply a few basic Coq tactics to put the goal into the proper form and then call the Gappa tactic 
to prove it automatically. 

3 Numerical Scheme for the Wave Equation 

The case of the numerical resolution of the one-dimensional acoustic wave equation using the 
second-order centered explicit scheme is chosen for its simplicity, and for its representativity of a 
wide class of scientific computing problems. We describe and state the different notions necessary 
for the implementation of the numerical resolution of the partial differential equation and its 
analysis. 

This section, as well as the steps taken at Section [4] to conduct the convergence proof of the 
numerical scheme, is inspired by [§]. 

3.1 Continuous Equation 

We consider a one-dimensional homogeneous acoustic medium VL = [a; m i n , x max ] characterized by 
the constant propagation velocity c. Let p(x, t) be the acoustic quantity, e.g. the transverse dis- 
placement of a vibrating string, or acoustic pressure. Let po(x) and p\{x) be the initial conditions. 
Let us consider homogeneous Dirichlet boundary conditions. 
The one-dimensional acoustic equation on Q writes 



(1) Vfc>0, VseQ, {L{c)p)(x,t) = ^{x,t)+A(c)p(x,t)=0, 

(2) V.eO, (L lP )(x,0)^^(x,0)=p 1 (x) ) 

(3) VxeO, {L oP )(x,0) d =p(x,0)=p Q (x) 1 

(4) \ft > 0, p(x min , t) = p(x max , t) = 

where the differential operator A(c) is defined by 

(5) A(c)^-c?^. 



We admit that under reasonable conditions on the Cauchy data po and p\ , there exists a unique 
solution p to the initial-boundary value problem Q-Q for each c > 0. Of course, it is well-known 
that the solution to this partial differential equation is given by d'Alembert's formula [3T]. But 
simply assuming the existence of a solution instead of exhibiting it opens the way to scale our 
approach to more general cases for which there is no known analytic expression of a solution, e.g. 
in the case of a nonuniform propagation velocity c. 

We associate the positive definite quadratic quantity 

(6) E{c)m) = \ 

where (q, r) = f J fl q(x)r(x)dx, \\q\\ 2 d = (q,q) and ||<z||^u c ) == f (A(c)q, q). The first term is inter- 
preted as the kinetic energy, and the second term as the potential energy, making E the mechanical 
energy of the acoustic system. 

Let po (resp. pi) represents the functions defined on the entire real axis R obtained by successive 
antisymmetric extensions in space respectively of po (resp. p±). For example, we have, for all 
x g [2x m i n — x max , x m in], for all t, p~Q{x,t) = — po(2x m in — x,t). The image theory [25] stipulates 
that the solution of the wave equation Q-Q coincides on the domain Q with the solution of the 
same wave equation but set on the entire real axis R, without the Dirichlet boundary condition Q, 
with extended Cauchy data po, p\. 



dp 
at 



(.,*) 



A{c) 
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3.2 Discrete Equations 

Let us consider the time interval [0, t max ]. Let i max (resp. fc max ) be the number of intervals of the 
space (resp. time) discretization. We defin^] 



(7) 
(8) 



Ax 



dcf x 



max *^mm 



clef 



A;> 



At 



def ^max 



t 

At 



The regular discrete grid approximating ft x [0, i max ] is defined by 



(9) 



Vfc G [0..fc max ], Vi G [0..w], xj d = (ai 4 , i fc ) d = (a; min + l Ax, fcAt). 



For a function g defined over ft x [0, i max ] (resp. ft), the notation gh (with a roman index h) 
denotes any discrete approximation of g at the points of the grid, i.e. a discrete function over 
[0..i max ] x [0..fc max ] (resp. [0..i max ]). By extension, the notation gh is also a shortcut to denote the 
matrix (gf)o<i<i max .o<fc<fc max (resp. the vector (gi)o<i<i max )- The notation g^ is reserved to the 
evaluation at the points of the grid, gf = f g(xf) (resp. fj = f q(xi)). 

Let uo,h and Ui t h be two discrete functions over [0..i max ]. Let Sh be a discrete function over 



[0.A 



[0..fc n 



Then, the discrete function ph over [0..i n 



x [0..fc max ] is said to be the 
solution of the second-order centered finite difference explicit scheme, when the following set of 
equations holds: 



(10) Vfc G [2..fc max ], Vi G [l..w - 1], 



(11) ViG [l.i max -l], 

(12) Vi G [l.i max - 1], 

(13) Vfce [0..fc max ], 



It f \ \ dcf Pi Pi i ^ (A ( \ \ 

{Li,h(c)Ph)i = — \- -^[A h {c)p h ) i = u u , 

(T \ dcf 

{L ^p h _) l = Pi = u ,i, 
Po = PL 







where the matrix Ah(c) (discrete analog of A(c)) is defined by 
(14) Vg h ,Vz G [l..i ma 



n / s \ dcf 2 g i+ i - 2gi + g^_i 
1], LA h (c)gh)i = -c ^ . 



Note the use of a second order approximation of the first derivative in time in ( 11 ) 
A discrete analog of the energy is also defined by 



(15) E h (c)(p h r + -z 
where, for any vectors gh and rh, 



fc+i dcf 1 



fe+1 _ k 
Ph Ph 



At 



An 



+ -ApIpI +1 ) Ah{c) 



(Qh,r h ] 



dcf 



Ax 



i=l 



(qh,r h ) Ah{c) d = (A h (c)g h ,r- h ) Ax , |kh||^ (c) = Ah(c) 



del' 



Note that ||-|| J 4 h ( c ) is a semi-norm. 

7 The floor notation [.J denotes the rounding to an integer towards minus infinity. 
8 For integers n and p, the notation [n..p] denotes the integer range [n,p] n N. 
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Note that the numerical scheme is parameterized by the discrete Cauchy data ito,h an d Ui,h> 
and by the discrete source term Sh- When these input data are respectively approximations of 
the continuous functions po, and pi {e.g. when ito.h = pbh; u i,h = Pih> an d s h = 0), the discrete 
solution ph is an approximation of the continuous solution p. 

The remark about image theory holds here too: we may replace the use of Dirichlet boundary 



conditions (13 1 by considering extended discrete Cauchy data po.in an d pi k- Note also that, as 



in the continuous case when a source term is considered, the discrete solution of the numerical 



scheme ( 10 H 13 ) can be obtained by the discrete space-time convolution of the discrete funda- 



mental solution and the discrete source term. This will be useful in sections 15.1.21 and 16.21 
3.3 Convergence 

Let £ be in ]0, 1[. The CFL(£) condition (for Courant-Friedrichs-Lewy, see [H]) states that the 
discretization steps satisfy the relation 



(16) 



cAt 
Ax 



<i-e 



The convergence error eh and the truncation error £h are defined by 



(17) 


Vfc € [0..fc max ], Vi e [0.i max ], e\ 


def 


P\~P\, 


(18) 


Vfc e [2..fc max ], Mi e [l..w - i], £\ 


def 


(L h (c)p h )f, 


(19) 


Vi e [l.imax - 1], sl 


def 


(£i,h(c)ph)i -Pi,i, 


(20) 


Vi e [l..w - 1], 4 


def 


(^0,hPh)i - Po,», 


(21) 


Vfc e [0..fc max ], £o=et ax 


def 


0. 



When the input data of the numerical scheme for the approximation of the continuous solution 
are given by ito.h = Poh; u i,h = Pihi an d s h = 0, the convergence error eh is itself solution of the 
same numerical scheme with discrete inputs corresponding to the truncation error: w ,h = £ h = 0j 

= e h ' and s h = (fc ^ £h +1 )- 
The numerical scheme is said to be convergent of order (m,n) uniformly on the interval [0, t max ] 
if the convergence error satisfieaj 



(22) 



fc At (t) 



A.T 



= O [0 ^ ] (Ax m + At n ). 



The numerical scheme is said to be consistent with the continuous problem at order (m, n) 
uniformly on interval [0,i max ] if the truncation error satisfies 



(23) 



A., 



= O [0 ^ ] (Ax m + At n ). 



The numerical scheme is said to be stable uniformly on interval [0, i max ] if the discrete solution 
of the problem without any source term satisfies 



(24) 3a,C u C 2 > 0, V< e [0,t max ], VAx, AO 0, V^x 2 + At 2 < a 

. < (Ci + C a t)(\\uo,h\\ Ax + lbo,h|! Ah(c) + \\ui, h \\ Ax ). 

Ax nK 1 



fcAt(t) 
Ph 



The result formally proved* at Section [4] states that if the continuous solution p is regular 
enough on 11 x [0, £ max ] a nd if the discretization steps satisfy the CFL(£) condition, then the 
second-order centered scheme is convergent of order (2, 2) uniformly on interval [0,i max ]. 

We do not admit (nor prove) the Lax equivalence theorem which stipulates that for a wide 
variety of problems and numerical schemes, consistency implies the equivalence between stability 



See Section 4.1 for the precise definition of the big O notation. The function k& t is defined in Q. 
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and convergence. Instead, we establish in the particular case of the one-dimensional acoustic wave 
equation that consistency and stability imply convergence. 

The Fourier transform is a very popular tool to study the convergence of numerical schemes. 
The formalization in Coq of Lebesgue integral theory and Fourier transform theory should not 
encounter any major difficulty, except for the human cost of such a development. Since there was 
no Coq support for these theories when we started this work (and there is still not), we decided 
to consider energy-based techniques, as they only involve finite summations (to compute discrete 
dot products). The energy-based approaches rely on a priori error estimations, so they are less 
precise; they can be applied to the heterogeneous case though, and to irregular approximation 
grids. 

3.4 C Program 



Listing 1: The main part of the C code, without annotations. 

/* Compute the constant coefficient of the stiffness matrix. */ 
al = dt/dx*v; 
a = al*al ; 

/* First initial condition and boundary conditions . */ 
/* Left boundary . */ 
P[0][0] = 0.; 

/* Time iteration —1 = space loop. */ 
for (i=l; i<ni ; i++) { 
p[ i ] [0] = p0( i *dx) ; 

} 

/* Right boundary . */ 
p[ni][0] = 0.; 

/* Second initial condition (with pl=0) and boundary conditions . */ 
/* Left boundary . */ 
p[0][l] = 0.; 

/* Time iteration = space loop. */ 
for (i=l; i<ni ; i++) { 

dp = p[i+l][0] - 2.*p[i][0] + p[i-l][0]; 

p[ i ] [1] = p [ i ] [0] + 0.5*a*dp; 

} 

/* Right boundary . */ 
p[ni][l] = 0.; 

/* Evolution problem and boundary conditions . */ 
/* Propagation — time loop. */ 
for (k=l; k<nk; k++) { 

/* Left boundary. */ 

p[0][k+l] = 0.; 

/* Time iteration k = space loop. */ 
for (i=l; i<ni ; i++) { 

dp = p[i+l][k] - 2.*p[i][k] + p[i-l][k]; 

p[i][k+l] = 2.*p[i][k] - p[ i ] [k— 1] + a*dp; 

} 

/* Right boundary . */ 
p[ni ] [k+1] = 0.; 

} 



We assume that a; m i n = 0, x max = 1, and that the continuous solution is bounded by 1. The 
main part of the C program is listed in Listing [T] The grid steps Ax and At are respectively 
represented by the variables dx and dt, the grid sizes i max and A: max by the variables ni and nk, 
and the propagation velocity c by the variable v. The initial value uo,h is represented by the 
function pO. The other input data Ux,h and Sh are supposed to be zero and are not represented. 
The discrete solution is represented by the two-dimensional array p of size (i max + l)(k max + 1). 
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Note that this is a simple implementation. A more efficient one would only store two time steps. 
3.5 Program Annotations 

The full annotations* are given in Appendix [Bj We give here some hints about how to specify this 
program. 

There are two axiomatics that define the logical formulas needed for the program annotations. 
The first one corresponds to the mathematics: the exact solution of the wave equation and its 
properties, ft defines the needed values (the exact solution p, and its initialization po). It also 
defines the derivatives of p {psoli, first derivative for the first variable of p, and psoln, second 
derivative for the first variable, and psoli and psol^i for the second variable). The value of the 
derivative of / is defined as the limit of f( x+h ^~f( x "> when h 0. As the ACSL annotations are 
only first order, these definitions arc quite cumbersome: each derivative needs 5 lines to be defined. 
Here is the example of psoli, that is to say f^: 



/*@ logic real psoLl(real x, real t); 




@ axiom psol _1 _def : 




@ \forall real x; \ forall real t; 




@ \forall real eps; \exists real C; < C &i 


It \ for all real dx; 


@ < eps => \abs(dx) < C => 




@ \abs(( psol (x + dx , t ) - psol(x, t)) / dx - 
@*/ 


- psol_l(x, t ) ) < eps; 



Note the different treatment of the positiveness for the existential variable C, and for the free 
variable eps. 

We also put as axioms the fact that the solution has the expected properties ([lJQ. The 
last property needed on the exact solution is its regularity. We require it to be near its Taylor 
approximations of degrees 3 and 4 on the whole interval [x m ; n , £ max ]. For instance, the following 
annotation states the property for degree 3. 

/*@ axiom psol _suff_regu I a r_3 : 
@ < alpha-3 && < C3 && 

@ \forall real x; \forall real t; \forall real dx; \forall real dt ; 

@ <= x <= 1 \sqrt(dx * dx + dt * dt) <= alpha_3 

@ \abs( psol (x + dx, t + dt) - psol _T aylor _3 (x , t, dx, dt)) <= 

@ C_3 * \abs(\pow(\sqrt(dx * dx + dt * dt), 3)); 

@*/ 



The second axiomatic corresponds to the properties and loop invariant* needed by the program. 
For example, we require the matrix to be separated. It means that a line of the matrix should not 
overlap with another line. Otherwise a modification would alter another point of the matrix. The 
predicate analytic_error that is used as a loop invariant is declared in the annotations and defined 
in the Coq files. 

The initialization functions are specified, but not stated. This corresponds firstly to the func- 
tion array2d_alloc that initializes the matrix and p_zero that produces an approximation of the po 
function. Our program verification is modular: our proofs are generic with respect to po and its 
implementation. 

The preconditions of the main function are as follows: 

• ! m ax and fc max must be greater than one, but small enough so that « max + 1 and fc max + I do 
not overflow; 

• the grid sizes Ax must be small enough to ensure the convergence of the scheme; 

• the floating-point values computed for the grid sizes must be near their mathematical values; 

• to prevent exceptional behavior in the computation of a, At must be greater than 2~ 1000 
and ^ must be greater than 2 - 500 . 

Ax ° 
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The last hypothesis gets very close to the underflow threshold: the smallest positive floating-point 
number is a subnormal of value 2~ 1074 . 

There are two postconditions, corresponding to a bound on the convergence error and a bound 
on round-off errors. See Sections l4l and |5~T1 for more details. 



4 Formal Proof of the Convergence of the Scheme 

We first present the notions necessary to formalize and prove the method error. The method 
error is the distance between the discrete value (with exact real arithmetic) and the continuous 
mathematical value; it reduces to the convergence error here. Formal specifications and proofs* 
often highlight mathematical details that are not visible in a pen-and-paper proof. So we detail 
how the proof is conducted: we establish the consistency and the stability, and prove that these 
two properties imply convergence in the case of the one-dimensional acoustic wave equation. 



4.1 Big O, Differentiability, and Regularity 

When considering a big O equality a — 0(b) (which must be understood as "a belongs to 0(b)"), 
one usually assumes that a and b are two expressions defined over the same domain and its 
interpretation as a quantified formula comes naturally. For Taylor approximations, the situation 
is a bit more different. Consider 

/(x,Ax) = 0(. 9 (Ax)) 

when || Ax|| goes to 0. If one were to assume that the equality holds for any x £ R 2 , one would 
interpret it as 

Vx,3a > 0,3C> 0,VAx, ||Ax|| < a =► |/(x, Ax)| < C ■ |ff(Ax)|, 

which means that constants a and C are in fact functions of x. Such an interpretation happens 
to be useless, since the infimum of a may well be zero while the supremum of C may be +oo. 

A proper interpretation requires the introduction of a uniform big O relation with respect to 
the additional variable x: 

(25) 3a > 0, 3C> 0, Vx e Oc, VAx e S1a x , ||Ax|| < a => |/(x, Ax)| < C ■ |<?(Ax)|. 

In Coq, this is written as follows, where Prop is the type of the logical propositions, normJ2 is the 
usual L 2 norm and Rabs the absolute value: 



Definition OuP (A : Type) (P : R * 


R - 


-> Prop) 


(f : A^ R * R^ R) 


(g 


: R * R^ R) : = 


exists alp : R, exists C : R, 


< 


alp A < C A 


forall X : A, forall dX : R * 


R, 


norm_l2 dX < alp -> P dX -> 


Rabs (f X dX) < C * Rabs 


(g 


dX). 



Here, type A stands for domain J7 X , and predicate P for domain f^Ax- 

To emphasize the dependency on the two subsets Q x and f^Axj uniform big O equalities are 
now written 

/(x, Ax)=On x ,fw(<7(Ax)). 

Now, we precisely define the notion of "sufficiently regular" functions in terms of the full- 
fledged notation for the big O. The further result on the convergence of the numerical scheme 
requires that the solution of the continuous equation is actually sufficiently regular. We can define 
the usual Taylor polynomial of order n of a function /: 

TP„(/.x, « (Ax, At) * £ A (± (») . g ^(x) - A,- - M*-A . 
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Let f2 x C K 2 . We say that the previous Taylor polynomial is a uniform approximation of order n 
of / on Q x when the following uniform big O equality holds: 

/(x + Ax) - TP„(/,x)(Ax) = 0x , R2 (|| Ax||" +1 ) . 

A function / is then said to be sufficiently regular of order n uniformly on f2 x when all its Taylor 
polynomials of order smaller than n are uniform approximations of / on Q x . 



4.2 Consistency 



We formally prove that, when the continuous solution of the wave equation (|T 
regular of order 4 uniformly on [a; m j n , x max ] x [0, £ max ], the numerical scheme (110 
with the continuous problem at order (2, 2) uniformly on interval [0, t max ] (see definition (23 1 
in Section 3.3 1. This is obtained using the properties of Taylor approximations; the proof is 



(4k is sufficiently 
( 13 ) is consistent 



straightforward while involving long and complex expressions. 

The key idea is to always manipulate uniform Taylor approximations that are valid in a compact 
subset that includes all points of all grids when the discretization steps goes down to zero. 



For instance, to take into account the initialization phase corresponding to Equation (11), we 



have to derive a uniform Taylor approximation of order 1 for the following continuous function 
(for any v sufficiently regular of order 3): 



((x,t),(Ax,At))^ 



v(x, t + At) - v(x, t) At 2 v(x + Ax, t) - 2v(x, t) + v(x - Ax, t) 



At 



Ax 2 



Note that the expression of this function involves both x + Ax and x — Ax, meaning that we 
need a Taylor approximation that is valid for both positive and negative variations. The formal 
proof would have been impossible if we had required < Ax (as a space grid step must be) in the 
definition of the Taylor approximation. In contrast with the case of an infinite string |llj , we do 



not need here a lower bound for c^. 

Aa: 



4.3 Stability 



For the proof of the round-off error (see Section 5.1), we need a statement of the same form as 



definition ( 24 ) of Section 3.3 Therefore, we formally prove that, under the CFL(£) condition ( 16 ) 



the numerical scheme ( |1Q[ ) — 
As stated in Section 13.3 



131) is stable uniformly on interval [0,t max ]- 
we use an energy-based technique to prove the convergence of the 



numerical scheme, hence we need to express the stability in terms of the discrete energy. More 



precisely, we formally prove that under the CFL(£) condition ( 16 1, the discrete energy ( 15 ) satisfies 
the following overestimation: 



(26) /SW^ < V^(c)(p h )3 + 



V2 



fc'=i 



A.r 



for all t G [0, i max ] and with k = — 1. 

The evolution of the discrete energy between two consecutive time steps is shown to be pro- 
portional to the source term. In particular, the energy is constant when the source is inactive. 
Then, we obtain the following underestimation of the discrete energy: 



Vfc, 



AT 

'Ax 



i 1 — ^ 



P? +1 -rf 
At 



< £ h (c)(p h ) fc+ *. 



Therefore, the nonnegativity of the discrete energy is directly related to the CFL(£) condition. 

Note that this stability result is valid for any Cauchy data ito,h, and ui hj so it may be subop- 
timal for specific choices. 
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4.4 Convergence 



We formally prove that, when the continuous solution of the wave equation ([lJ-Q is sufficiently 
regular of order 4 uniformly on [x m ; n , x max ] x [0,t max ], and under the CFL(£) condition (16), 



definition (22) in Section 3.3) 



the numerical scheme (10H13) is convergent of order (2, 2) uniformly on interval [0,t max ] (see 



Firstly, we prove that the convergence error is itself the discrete solution of a numerical 
scheme of the same form but with different input data. (Of course, there is no associated continuous 
problem.) In particular, the source term (on the right-hand side) is here the truncation error Sh 
associated with the initial numerical scheme for p^. Then, the previous stability result holds, and 
we have an overestimation of the square root of the discrete energy associated with the convergence 
error -Eh(c)(eh) that involves a sum of the corresponding source terms, i.e. the truncation error. 
Finally, the consistency result also makes this sum a big O of Ax 2 + At 2 , and a few more technical 
steps and a study of initializations conclude the proof. 



4.5 Difficulties 

Big O. In Section |4~T| we present two interpretations of the big O notation. Usual mathematical 
pen-and-paper proofs switch from one interpretation to the other depending on which one is the 
most adapted, without noticing that they may not be equivalent. The formal development was 
helpful in bringing into light the erroneous reasoning hidden by the usage of big O notations. We 
introduced the notion of uniform big O in |llj in the context of an infinite string. In the present 
paper, we consider the case of the finite string, hence for compactness reasons, both notions are 
in fact equivalent. However, we still use the more general uniform big O notion to share most of 
the proof developments between the finite and infinite cases. Regarding automation, a decision 
procedure* has been developed in 3J; unfortunately, those results were not applicable since we 
needed a uniform big O. 



Annotations describing differential equations. As long as we were studying only the method 
error, we did not have to define the differential operators nor assume anything about them |llj . 
Their only properties appeared through their usage: function p is a solution of the partial differen- 
tial equation and is sufficiently regular. This is no longer possible for the annotated* C program. 
We therefore need some total operator equal to the derivative when the function is diffcrentiable, 
which was not available in the standard Coq library. We previously added an axiom stating its 
existence; we recently removed it thanks to [14] . 

Differential equations introduce another issue: due to the underlying logic, the annotations 
have to define p as a solution of the PDE by using first-order formulas* stating differentiability, 
instead of second-order formulas involving differential operators. This makes the annotations 
especially tedious and verbose. 



Method error. The Coq proof of the method error is about 5000-line long. About half of 
it is dedicated to the wave equation, while the other half is re-usable: Hilbert space analysis, 
asymptotic analysis, Taylor approximations, and so on. We formally proved without any axiom 
that the numerical scheme is convergent of order 2 both in space and time, which is the known 
mathematical result. An interesting aspect of the formal proof in Coq is that we were able to 
extract the constants a and C appearing in the big O for the convergence result in order to obtain 
their precise values. The recursive extraction was fully automatic after making explicit some 
inlining. The mathematical expressions are given in Section [5.2| 
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5 Formal Proof of the C Program 
5.1 Round-off Error 

As is well-known, each operation on real numbers is implemented with IEEE-754 floating-point 
numbers [33 HH], so round-off errors will occur and may endanger the accuracy of the final results. 
On this program, naive forward error analysis* gives an error bound that is proportional to 2 fe 2 -53 
for the computation of a p\. If this bound was sensible, it would cause the numerical scheme to 
compute only noise after a few steps. Fortunately, round-off errors actually compensate themselves 
in this program. To take into account compensations and hence prove a usable error bound, we 
need a precise statement of the round-off error [10] to exhibit the cancellations made by the 
numerical scheme. 

Note that bounding the round-off errors and exhibiting the compensations is a novel idea. 
This is also very complex to prove, first on paper (several pages of error-prone computations with 
double summations) and then with Coq (with few automations and cumbersome notations). The 
formalizations of floating-point arithmetic we used are previous works \21\ I15j : they reflect the 
IEEE-754 standard and therefore handle both normal and very small numbers (called subnormal). 
Other floating-point special values (infinities, NaNs) are proved not to appear in the program. 



5.1.1 Local Round-off Errors 



Let 8f be the local floating-point error made for the computation of p\. For k = (resp. k = 1, 
and k > 2), the local error corresponds to the floating-point error made in line 9 of Listing [T] 
(resp. in lines 19-20, and in lines 32-33). To distinguish them from the discrete values of previous 
sections, we decorate the floating-point values as computed by the program with an underline. 
Quantities a and p k match the expressions a and p[i][k] in the annotations* , while a and p\ are 



represented by \exact(a) and \exact(p[i][k]), as described in Section 2.2.2 
The local floating-point error is defined as follows: 

Vfc€ [l..fc max -l],Vie[l..w-l], S k+1 = (2^-p^ 1 +a(^ +1 -2p j fc +^_ i )) 
V, € [l..w - 1], Si = (p° + a -{f i+1 - 2p° -pi 

vi€[i..w-i], a?=p?-j>9. 



Note that the program presented in Section 3.4 gives us that 



Vfc 6 [l..fc max - 1], Vi e [l..i max - 1], 

Vi € [Limax - 1], 
Vi 6 [l.imax - 1], 



„fc+l 



= fl 



p° = H(po(i*Ax))- 



where fl(-) means that all the arithmetic operations that appear between the parentheses are 
actually performed by floating-point arithmetic, hence a bit off. 

In order to get a bound on Sf, we need to know the range of p.. For the bound to be usable in 
our correctness proof, the range has to be [—2, 2]. We assume here that the continuous solution is 
bounded by 1; if not, we normalize by the maximum value and use the linearity of the problem. 
We have proved this range by induction on a simple triangle inequality taking advantage of the 
fact that at each point of the grid, the floating-point value computed by the program is the sum 
of the continuous solution, the method error, and the round-off error. 

To prove the bound on S k , we perform forward error analysis and then use interval arithmetic* 
to bound each intermediate error |20j . We prove that, for all i and k, we have \5 k \ < 78 • 2" 52 for 
a reasonable error bound for a, that is to say \a — a\ < 2~ 49 . 
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5.1.2 Global Round-off Error 

Let A k = p k — p k be the global floating-point error made for the computation of p k . The global 
floating-point error depends not only on the local floating-point error made for the same i and fc, 
but also on all the local floating-point errors inside the space-time dependency cone of apex (i,k). 

Indeed, from the linearity of the numerical scheme, the global floating-point error is itself 
solution of the same numerical scheme with discrete inputs corresponding to the local floating- 
point error: 

S 1 ( <5 fc+1 \ 

U0,h = <5° = 0, U hh = S h=(fc^^-j- 

Taking advantage on the remark about image theory made in Section |3.2[ we see the global 
floating-point error as the restriction to the domain D, of the solution of the same numerical scheme 



set on the entire real axis without the Dirichlet boundary condition (13), and with extended dis- 
crete inputs. Therefore, the expression of the global floating-point error is given by the convolution 
of the discrete fundamental solution on the entire real axis and the local floating-point error. 

Let us denote by Ah the discrete fundamental solution. It is solution of the same numerical 
scheme set on the entire real axis with null discrete inputs except u 10 = x?. Then, we can state 
the following result. (See [TU] for a direct proof that does not follow the above remarks.) 

Theorem 1. 

k I 

Vfc > 0, V* G [0. i max ] , A* = £ Yl ~ S i-d X \ +K 

1=0 j=-l 

Note that, for all k, we have Aq = A^ = 0. Note also that X k vanishes as soon as \i\ > k, 
hence the sum over j could be replaced by an infinite sum over all integers. 

5.1.3 Bound on the Global Round-off Error 

The analytic expression of A^ can be used to obtain a bound on the round-off error. We will need 
two lemmas for this purpose. 

Lemma 1. 

Vfc > 0, a k = f ]T \* = k. 

2— — oo 

Proof. The sum Oh satisfies the following linear recurrence: for all k > 1, a k+1 — 2a k + a k ~ 1 = 0. 
Since <r° = 0, and a 1 = 1, we have, for all k > 0, a k — k. □ 

Lemma 2. 

Vfc > 0, Vi, X k > 0. 

The sketch of the proof is given in Appendix [D] It uses complex algebraic results (the positivity 
of sums of Jacobi polynomials), and was not formalized in Coq. 

Theorem 2. 

Vfc >0, Vie [0..w], K fc | < 78-2" 53 (fc + l)(fc + 2). 



5.1.1 



Proof. According to Theoremjlj A k is equal to J2i=o J2j=-i M-j- We- know from Section 
that for all j and I, \M\ < 78 • 2~ 52 , and from Lemma [l] that J2 = i + 1- Since the A's are 
nonnegative (LemmaK2J), the error is easily bounded by 78 • 2~ 52 J2i=o(^ + ■'■)• ^ 

Except for Lemma [2j all the proofs described in this section have been machine-checked using 
Coq. In particular, the proof of the bound on 5 k was done automatically by calling Gappa from 
Coq. Lemma [2] is a technical detail compared to the rest of our work, that is not worth the 
immense Coq development it would require: keen results on integrals but also definitions and 



results about the Legendre, Laguerre, Chebychev, and Jacobi polynomials (see Section 6.2). 
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5.2 Total Error 



Let £h be the total error. It is the sum of the method error (or convergence error) e^ of Sections 3.3 



and 4.4 and of the round-off error A^ of Section 5.1 

From Theorem [2] we can estimate^] the following upper bound for the spatial norm of the 
round-off error when Ace < 1 and At < f ma x/2: for all t £ [0,t max ], 





\% ^ A 



fe At (t) 



A.r 



E 





f A k At (t)y 



Ax 



l)Ax -78-2 



-53 



At 



At 



1 • 78 • 2" 53 • 3 



At 2 



Thus, from the triangular inequality for the spatial norm, we obtain the following estimation 
of the total error: 



(27) Vt £ [0,t max ], VAx, ||Ax|| < mm(a e ,a A ) 



[i ^ £, 



<C e (Ar 2 +At 2 ) + -£| 

Ax At 2 



where the method error constants a e and C e were extracted from the Coq proof (see Section 4.4 1 
and are given in terms of the constants for the Taylor approximation of the exact solution at 
degree 3 (03 and C3), and at degree 4 (a^ and C4) by 



(28) 
(29) 



a e = min(l,i max ,a 3 ,a 4 ), 



a_ 

V2 



l)C" 



f , C = max(l,C 3 + c 2 C 4 + 1), and C" = max(C",2(l 



with a = ^ 

round-off constants a a and Ca, as explained above, are given by 



c 2 )Q) 



and where the 



(30) 
(31) 



a A = min(l,i max /2), 
C A = 234-2~ 53 t 2 



1. 



Of course, decreasing the size of the grid step decreases the method error, but in the same 
time, it increases the round-off error. Therefore, there exists a minimum for the upper bound 
on the total error, corresponding to optimal grid step sizes that may be determined using above 
formulas. 

On specific examples, one observes that this upper bound on the total error can be highly 
overestimated when £ is small. The main reason is the [x 2 factor in the expression of C e , which 
is asymptotically equivalent to l/£ for small values of £ (see Section 6.2). Nevertheless, one also 
observes that the asymptotic behavior of the upper bound of the total error for high values of the 
grid steps is close to the asymptotic behavior of the effective total error [T2] . 



6 Discussion 

6.1 Automation and Manual Proofs 

Given the program code, the Frama-C/Jessie/Why tools generate 150 verification conditions* that 
have to be proved. While possible, proving all of them in Coq would be rather tedious. Indeed, 

10 When r > 2, we have (r + 1)(t + 2) < 3t 2 . 
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while Coq provides a few automations, it is more of a proof checker, so the proofs end up being 
mostly written by the user. Moreover, a systematic usage of Coq would lead to a rather fragile 
construct: any later modification to the program, however small it is, would cause different proof 
obligations to be generated, which would then require additional human interaction to adapt the 
Coq proofs. We prefer to have automated theorem provers* (SMT solvers* and Gappa) prove as 
many of them as possible, so that only the most intricate ones are left to be proven in Coq. The 
following table in Figure [I] shows how many goals* are proved automatically and how many are 
left to the userFH 



Prover 


Proved Behavior VC 


Proved Safety VC 


Total 


Alt-Ergo 


18 


80 


98 


CVC3 


18 


89 


107 


Gappa 


2 


20 


22 


Z3 


21 


63 


84 


Automatically proved 


23 


94 


117 


Coq 


21 


12 


32 


Total 


44 


106 


150 



Figure 1: Number and type of proved verification conditions (VC) depending on the prover. 

Safety goals are the ones that are always generated, even in the absence of any specification* . 
Proving them ensures that the program terminates without any error when executed. They check 
that matrix accesses are in range, that the loop variants decrease and are nonnegative (thus loops 
terminate), that integer and floating-point arithmetic operations do not overflow, and so on. On 
such goals, automatic provers are helpful: they prove about 90 % of the goals. 

Behavior goals are the ones that relate a program to its specification. Proving them ensures 
that, if the program terminates without error, then it returns the specified result. They check 
that loop invariants* are preserved, that assertions hold, that preconditions hold before function 
calls, that postconditions are implied by preconditions, and so on. Automatic provers are a bit less 
helpful, as they fail for half of the goals. Some of these failed goals depend on an uninterpreted 
predicate, so automatic provers are missing crucial information. But even if that information was 
provided, the goals would still be too complicated for them, since they use mathematical theories 
out of their scope. That is why we resort to an interactive higher-order theorem prover* , namely 
Coq. 

Coq proofs are split into two sets: first, the mathematical proof of convergence and second, 
the proofs of bounded round-off errors and absence of runtime errors. 



Type of proofs 


Nb statement lines 


Nb proof lines 


Convergence 


999 


4108 


Round-off + runtime errors 


7 776 


12 336 



Note that most proof statements regarding round-off and runtime errors are automatically 
generated (7 289 lines out of 7 776) by the Frama-C/ Jessie/ Why framework. 

6.2 Applied Mathematicians and Formal Proof of Programs 
6.2.1 Method Error 

From the computational scientist point of view, the big surprise of a formal proof* development 
is the requirement to start by writing an extremely detailed conventional pen-and-paper proof. 
Indeed, a proof assistant such as Coq is not able to elaborate a sketch of proof nor to automatically 
find a proof of a lemma (but for the most trivial facts). Therefore, a formal proof is completely 

11 Note that verification conditions might be proved by one or several automated provers, and that no single 
prover is able to prove all the automatically proved conditions. 
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human driven, up to the utmost detail. Fortunately, we have not to specify all the mathematical 
facts since Coq provides a collection of known facts and theories that can be reused to establish new 
results. However, classical results from Hilbertian analysis, Taylor approximations, and asymptotic 
comparison of functions, have not yet been proved in Coq; thus, we had to develop them, hence 
also in the detailed pen-and-paper proof. In the end, the detailed pen-and-paper proof is about 
50-page long. Surprisingly, it is roughly the same size as the complete Coq formal proof (both 
source files have approximately 5,000 lines and 200,000 characters). 

The other unexpected fact is the necessity to precisely state the definition of the big O notion 
in the context of Taylor expansions and discrete grids. This leads to focus on the question of 
uniformity of the existential constants, and use a uniform version of the big O notion in the 
proofs. This point does not seem to be dealt with in the literature; there is an informal mention 
in |37j with a relevant remark about pointwise consistency versus norm consistency. 

With pen-and-paper proofs, it is uncommon to expand the constants on the upper bound of 
the method error. In contrast, Coq can automatically extract the constants from the formal proof. 
On specific examples, we can see that they are far from being optimal. A priori error estimations 
provided by energy-based techniques are known to be less precise as the CFL condition becomes 
optimal. Indeed, the factor \i = ^ in the expression of the overestimation of the discrete 

energy (26) grows to +oo when £ goes down to + . This makes the constant C e in (29) grow as l/£ 
for small values of £ . The alternative to obtain more accurate bounds is known: use a leapfrog 
scheme for the equivalent first-order system. 



6.2.2 Round-off Error 

In numerical analysis, it is common to evaluate bounds for the norm of quantities. To obtain 
an upper bound on the round-off error, we needed a result about the sign of the fundamental 
solution of the discrete scheme, not about a bound for its absolute value. To our knowledge, the 
only way to capture such a result is purely algebraic: the closed-form expression of the solution is 
first obtained using a generating function, then it may be recognized as a combination of Jacobi 
polynomials that happens to be nonnegative (see Appendix |d|). 

Such a result about the sign of the discrete fundamental solution may not hold for other 
numerical schemes. In our case, just assuming a bound on the absolute value of the discrete 
fundamental solution would lead to a slightly worse bound on the global round-off error |A^| < 
156-2- 52 (fc + I) 2 . 

More generally, a numeric scheme is said stable if the values do not diverge. This typically 
means that, given initial values (such as the values and derivative(s) at time 0), the values of the 
numerical scheme do not diverge. In floating-point arithmetic, an algorithm is said numerically 
stable if it has a small forward error: a small error on the input gives a small error on the output. 
This is of course the kind of algorithm we are looking for. Computational scientists believe (as a 
rule of thumb) that stable schemes are numerically stable. This is a belief funded by experiments 
and examples. We think this fact to be true. To prove it, we would need to formalize the notion 
of numerical scheme, the property of numerical scheme stability, and then deduce the numerical 
stability of any stable numerical scheme. This result should be generic enough to be applied to 
a large class of numerical schemes. It should also state a good enough bound to guarantee a 
reasonable accuracy. 



6.3 Future Work 

Another way to explore is the mechanism of program extraction. Instead of proving an existing 
program, one formalizes a problem, proves it has a solution, and obtains for free an OCaml 
program from the Coq proof terms. This program is usually not efficient, but it is a zero-defect 
program, provided that it is not modified by hand. In our case, efficiency is not the only issue; 
the extracted program would also be hindered by the omnipresence of classical real numbers in 
our formalization. So being able to extract efficient and usable zero-defect programs would be an 
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interesting long term goal for this kind of critical numerical problems. 

For this exploratory work, we considered the simple three-point scheme for the one-dimen- 
sional wave equation. Further works involve scaling to higher-dimension. The one-dimensional 
case showed us that summations and finite support functions play a much more important role in 
the development than we first expected. We are therefore moving to the SSReflect interface and 
libraries for Coq [8], so as to simplify the manipulations of these objects in the higher-dimensional 
case. 

Another perspective is to generalize our approach to other PDEs and ODEs. However, the 
proofs of Section [4] are entangled with particulars of the presented problem, and would therefore 
have to be redone for other problems. Our basic definitions have been tested and we think we 
have encountered most of the hurdles met in the formalization and proof of numerical analysis 
programs. In particular, we intend to generalize our method for bounding the round-off error 
using convolution with the discrete fundamental solution. 

A more ambitious perspective is to formally deal with the finite element method. This first 
requires a Coq formalization of mathematical tools from Hilbertian analysis on Sobolev spaces 
such as the Lax-Milgram theorem (for existence and uniqueness of continuous and approximated 
solutions), and the handling of meshes (that may be regular or not, and structured or not). Then, 
the finite element method may be proved convergent in order to formally guarantee a large class 
of numerical analysis programs that solve linear PDEs on complex geometries. The goal is to go 
beyond Laplace's equation set on the unit square: e.g. handle mixed boundary conditions, extend 
to mixed and mixed hybrid finite element methods. 

7 Conclusion 

We have presented a comprehensive formal proof of a C program solving the ID acoustic wave 
equation using the second-order centered finite difference explicit scheme. Our proof includes 

• Safety: We prove that the C program terminates and is free of runtime failure such as 
division by zero, array access out of bounds, null pointer dereference, or arithmetic overflow. 
The latter includes both integer and floating-point overflows. 

• Method error: We show that the numerical scheme is convergent of order (2, 2) uniformly, 
under the assumptions that the continuous solution is sufficiently regular of order 4 uniformly 
and that the Courant-Friedrichs-Lewy condition holds. 

• Round-off error: We bound all round-off errors resulting from floating-point computations 
used in the program. In particular, we show how some round-off errors compensate, to get 
a meaningful bound at the end. 

• Total error: Altogether, we are able to provide explicit (and formally proved) bounds for 
the sum of method and round-off errors. 

To our knowledge, this is the first time such a comprehensive proof is achieved. The total size of 
that formal proof is huge, if not frightening: several man-months of work, more annotations to 
be inserted in the C program than lines of code (see appendix |B|) , over 16,000 lines of Coq proof 
scripts, 30 minutes of CPU time to check them. 

The recent introduction of formal methods in DO-178C F^l shows the need for the verification 
of numerical programs in the context of embedded critical software. Considering the work we 
have presented, one can hardly think of verifying numerical codes on the scale of a large airborne 
system. Yet we think our techniques and a large subset of our proof can be reused and would 
significantly decrease the workload of such a proof. This is to be combined with increased proof 
automation, so that user interaction is minimized. Finally, the challenge is to provide tools that 
are usable by computational scientists that are not specialists of formal methods. 

12 DO-178C is the newest version of Software Considerations in Airborne Systems and Equipment Certification, 
which is used by national certification authorities. 
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In conclusion, combining scientific computing and formal proofs is a hot topic that logicians now 
consider important. Formal tools for computational science are actively developed and progress is 
done to get them mature and usable for non specialists. It seems to be the time for computational 
scientists to take a keen interest in this area. 
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A Glossary 

This section gives the definition of concepts used in mathematical logic and computer science that 
appear in this paper. 

annotation a comment added to the C code to specify the logical properties of the program. 
Tools turn them into verification conditions*. 

automated theorem prover a software tool that automatically proves logical properties. It 
may fail to find a proof, even for valid properties. 

decision procedure an algorithm dedicated to proving specific properties. This is one of the 
basic blocks of theorem provers. 

deductive verification process of verifying, with the help of theorem provers, that a program 
satisfies its specification*. 

first-order logic formal language of logical formulas that use quantifications over values only, 
and not over predicates and functions. 

formal proof a finite sequence of deduction steps which are checked by a computer. 

forward error analysis process of propagating error bounds from inputs to outputs of functions. 

functional programming language programming paradigm that treats computation as math- 
ematical evaluation of functions and considers functions as ordinary values. 
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goal logical property to formally prove*. The conclusion of a theorem is the initial goal. 

higher-order logic formal language of logical formulas that use quantifications over values, pred- 
icates and functions. 

inference rule generic way of drawing a valid conclusion based on the form of hypotheses. Each 
deduction step of a formal proof* must satisfy one of the inference rules of the logical system. 

interactive proof assistant a software tool to assist with the development of formal proofs* by 
human-machine collaboration. 

interval arithmetic arithmetic that operates on sets of values (typically intervals) instead of 
values. 

loop invariant logical formula about the state of a program, that is valid before entering a loop 
and remains valid at the end of each iteration of the loop. 

semantics the meaning associated to each syntactic construct of a language. 

SMT solver a variety of automated theorem prover* combining a SAT solver (propositional 
logic), equality reasoning, decision procedures (e.g., for linear arithmetic), and quantifier 
instantiation. 

specification description of the expected behavior of a program, 
static analysis analysis of a program without executing it. 

tactic command for an interactive proof assistant* to transform the current goal* into one or 
more goals that imply it. 

validation process of guaranteeing the correctness of a program by experiments such as tests. 

verification process of guaranteeing the correctness of a program by mathematical means such 
as formal proofs*. 

verification conditions goals that need to be proved to guarantee the adequacy between the 
program and its specification*. 

B Fully Annotated Source Code 



/*@ axiomatic d i rich let.m aths { 
@ 

<§ logic real c; 

<§ logic real pO(real x); 

<§ logic real psol(real x, real t); 

<§ axiom c_pos : < c; 

@ logic real psol_l( real x, real t); 

@ axiom psol .1 -def : 

@ \forall real x; \ forall real t; 

@ \forall real eps; \exists real C; 0< C&&\forall real dx; 
@ < eps => \abs(dx) < C 

@ \abs((psol(x + dx, t) - psol(x, t)) / dx - psol _1 (x , t))<eps; 

<§ logic real psoLll ( real x, real t); 

@ axiom psoLll-def : 

@ \forall real x; \ forall real t; 

<§ \forall real eps; \exists real C; 0< C&&\forall real dx; 
@ < eps => \abs(dx) < C => 

@ \abs(( psol.l (x + dx, t ) - psol.l(x, t)) / dx - psol.ll(x, t))< eps; 
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@ logic real psoL2(real x, real t); 

@ axiom psol .2 _def : 

@ \forall real x; \ forall real t; 

<§ \forall real eps; \exists real C; 0< C&&\forall real dt; 
@ < eps =S> \abs(dt) < C => 

@ \abs((psol(x, t + dt) - psol(x, t)) / dt - psoL2(x, t)) < eps; 

IS logic real psoL22( real x, real t); 

@ axiom psol _22_def : 

@ \forall real x; \ forall real t; 

@ \forall real eps; \exists real C; 0< C &&\forall real dt ; 
@ < eps => \abs(dt) < C => 

@ \abs((psoL2(x, t + dt) - psoL2(x, t)) / dt - psoL22(x, t)) < eps; 

@ axiom wave_eq_0: \ forall real x; <— x <= 1 psol(x, 0) = pO(x); 

@ axiom wave_eq_l: \ forall real x; <= x <= 1 psol_2(x, 0) = 0; 

@ axiom wave-eq-2 : 

@ \forall real x; \ forall real t; 

<3 <= x <= 1 psoL22(x, t) - c * c * psol.ll(x, t) = 0; 

@ axiom wave_eq_dirichlet_l : \ forall real t; psol(0, t) = 0; 
@ axiom wave_eq_dirichlet_2 : \forall real t; psol(l, t) = 0; 

<§ logic real psol _T 'aylor _3 ( real x, real t, real dx , real dt); 
<§ logic real psoLTaylor_4 ( rea I x, real t, real dx , real dt); 

@ logic real alpha. 3; logic real C.3; 
@ logic real alpha_4; logic real C-4; 

@ axiom psoLsuff_regular_3 : 
<3 < alpha.3 && < C.3 && 

@ \forall real x; \ forall real t; \ forall real dx; \ forall real dt; 

@ <= x <= 1 =^> \sqrt(dx * dx + dt * dt) <= alpha_3 

@ \abs(psol(x + dx, t + dt) - psoLT aylor.3 (x , t, dx, dt)) <= 

@ C.3 * \abs(\pow(\sqrt(dx * dx + dt * dt), 3)); 

@ axiom psol _suff ^regular _4 : 
@ < alpha.4 && < C.4 && 

@ \forall real x; \ forall real t; \ forall real dx; \ forall real dt; 

@ <= x <= 1 =^> \sqrt(dx * dx + dt * dt) <= alpha_4 

@ \abs( psol (x + dx, t + dt) - psol. Taylor. 4 (x , t, dx , dt)) <= 

@ C.4 * \abs(\pow(\sqrt(dx * dx + dt * dt), 4)); 

@ axiom psol.le : 

@ \forall real x; \ forall real t; 

@ <= x <= 1 => <= t \abs(psol (x, t)) <= 1; 

@ logic real T_max; 

@ axiom T.max.pos: < T.max; 

@ logic real C.conv; logic real alpha.conv ; 
@ lemma alpha.conv _pos : 0< alpha.conv; 
@ 

@ } */ 



/*@ axiomatic dirichlet.prog { 
@ 

@ predicate a n a I yt ic _e rro r{L} 

@ (double **p , integer ni , integer i , integer k, double a, double dt) 

@ reads p[. .][. .]; 

@ 

@ lemma a n a I yt ic _e r ro r.l e{L} : 

@ \forall double **p; \ forall integer ni; \ forall integer i; 

@ \forall integer nk; \ forall integer k; 

<§ \forall double a; \ forall double dt ; 

<3 < ni ==> <= / <= ni => <= k 
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< \exact(dt) => 

a n a lyti c_error (p , ni , i , k, a, dt) => 

\sqrt(l. / (ni * n i ) + \exact ( dt ) * \exact ( dt ) ) < alpha_conv =S 
k <— nk => nk <= 7598581 nk * \exact(dt) <= T_max => 

\exact(dt) * ni * c <= 1 — 0xl.p—50^> 
\forall integer il; \ forall integer kl ; 
<= il <= ni <= kl < k 
\abs(p[il][kl]) <= 2; 

predicate sepa rated _matrix{L} (double **p, integer leni) = 
\forall integer i; \ forall integer j; 
<— i < leni =^> <= j < leni =^> i != j 
\base_addr(p[ i ]) != \base_addr(p[j ] ) ; 

logic real sqr_norm_dx_conv_err{L} 

(double **p, real dx, real dt , integer ni , integer i , integer k) 

reads p [. .][. .] ; 
logic real sqr(real x) — x * x; 
lemma sq r_norm _dx_conv_err_0{L} : 

\forall double **p; \ forall real dx; \ forall real dt; 

\forall integer ni; \forall integer k; 

sqr.norm _dx_conv_err (p , dx, dt , ni , 0, k) = 0; 
lemma sq r_norm _dx_conv_err_succ{L} : 

\forall double **p; \ forall real dx; \ forall real dt; 

\forall integer ni; \forall integer i; \forall integer k; 

<= / => 

sqr.norm-dx-conv-err (p , dx, dt , ni , i + 1, k) = 
sq r_norm_dx_conv_err (p , dx, dt , ni , i, k) + 
dx * sqr (psol (1 . * i / ni , k * dt) — \exact ( p [ i ] [ k] ) ) ; 
logic real norm_dx_conv_err{L} 

(double **p, real dt , integer ni , integer k) = 

\sqrt (sqr_norm_dx_conv_err (p , 1. / ni , dt , ni , ni , k)); 

} */ 



/*<§ requires leni >= 1 &&c lenj >= 1; 

@ ensures 

@ \valid_range(\result , 0, leni - 1 ) && 

@ (\forall integer i; <= i < leni 

@ \valid-range(\result [ i ] , 0, lenj — 1)) 

@ separated_matrix(\result , leni); 

@ */ 

double ** a rray2d _a I loc ( int leni, int lenj); 



requires (I != 0) && \round .error (x) <= 5./2*0xl . p—53; 
<§ ensures 

@ \round_error(\result) <= 14 * 0xl.p-52Mi 
@ \exact(\result) = pO(\exact (x) ) ; 
<3 */ 

double p_zero(double xs , double I, double x); 



/*@ requires 

@ ni >— 2 &&> nk >= 2 && I .'= && 

@ dt > 0. && \exact(dt) > 0. && 

@ \exact(v) = c &&. \exact(v) = v && 

@ 0x1. p- 1000 <- \exact(dt) && 

@ ni <= 2147483646 && nk <= 7598581 && 

@ nk * \exact(dt) <= T_max && 

@ \abs(\exact(dt) - dt) / dt <= Oxl.p-51 && 

<3 0x1 .p-500 <- \exact(dt) * ni * c <= 1 - 0xl.p-50&& 

@ \sqrt(l. / (ni * ni) + \exact(dt) * \exact(dt)) < alpha_conv; 

<§ 

<§ ensures 

@ \forall integer i; \forall integer k; 
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@ <= i <= ni =^> <= k <= nk =^> 

@ \round_error(\result [ i ] [k]) <= 78. / 2 * Oxl.p-52 * (k + 1) * (k + 2); 
<3 

<§ ensures 

@ \forall integer k; <= k <= nk => 

<§ norm_dx_conv_err(\result , \exact(dt), ni , k) <= 

@ C_conv * (1. / (ni * ni) + \exact(dt) * \exact(dt)); 

@ */ 

double **forward_prop( int ni , int nk, double dt , double v, 
double xs , double I ) { 

/* Output variable. */ 
double **p; 

/* Local variables . */ 
int i , k ; 

double al , a, dp, dx; 

dx = 1 . / n i ; 
/*@ assert 

@ dx > 0. && dx <= 0.5 && 

@ \abs(\exact(dx) - dx) / dx <= 0x1. p- 53; 

@ */ 

/* Compute the constant coefficient of the stiffness matrix. */ 
al = dt/dx*v; 
a = al*al ; 
/*@ assert 

@ <= a <= 1 &&l 

@ < \exact(a) <= 1 &&c 

@ \round_error (a) <= 0x1 . p—49; 

@ */ 

/* Allocate space— time variable for the discrete solution. */ 
p= a rray2d _a I loc ( n i +1, nk+1); 

/* First initial condition and boundary conditions . */ 
/* Left boundary . */ 
p[0][0] = 0.; 

/* Time iteration —1 = space loop. */ 
/*@ loop invariant 
<S 1<= i <= ni && 

@ analytic-error (p , ni , i — 1, 0, a, dt); 
@ loop variant ni — i; */ 
for (i=l; i<ni ; i++) { 

p[i][0] = p_zero(xs, I, i*dx); 

} 

/* Right boundary. */ 
p[ni][0] = 0.; 

/*@ assert a nalyti c.error (p , ni , ni , 0, a, dt); */ 

/* Second initial condition (with p_one~0) and boundary conditions . */ 
/* Left boundary . */ 
p[0][l] = 0.; 

/* Time iteration = space loop. */ 
/*@ loop invariant 
@ 1<= i <= ni Mi 

@ a n a lytic-error (p , ni , i — 1, 1, a, dt); 
@ loop variant ni — i; */ 
for ( i=l; i<ni ; i++) { 

/*@ assert \abs ( p[ i - 1] [0] ) <= 2; */ 
/*@ assert \abs(p[ i ] [0] ) <= 2; */ 
/*@ assert \abs(p[ i +1][0]) <- 2; */ 
dp = p[i+l][0] - 2.*p[i][0] + P [i -1][0]; 
p[ i ] [1] = p[ i ] [0] + 0.5*a*dp; 

} 

/* Right boundary. */ 
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p[ni][l] = 0.; 

/*@ assert a n a lyti c_error (p , ni , ni , 1, a, dt); */ 

/* Evolution problem and boundary conditions . */ 
/* Propagation = time loop. */ 
/*@ loop invariant 
@ 1 <= k <= nk && 

@ a nalytic_error (p , ni , ni , k, a, dt); 
@ loop variant nk — k; */ 
for (k=l; k<nk; k++) { 
/* Left boundary. */ 
p[0][k+l] = 0.; 

/* Time iteration k = space loop. */ 
/*@ loop invariant 
@ 1<= i <= ni Mi 

@ a n a lyti c_error (p , ni , i — 1, k + 1 , a, dt); 
@ loop variant ni — i ; */ 
for ( i=l; i<ni ; i++) { 

/*@ assert \abs(p[ i - l][k] ) <= 2; */ 

/*@ assert \abs(p [ i ] [ k] ) <= 2; */ 

/*<§ assert \abs(p [ i +1] [k] ) <= 2; */ 

/*@ assert \abs( p [ i ] [ k- 1]) <= 2; */ 

dp = p[i+l][k] - 2.*p[i][k] + p[ i l][k] ; 

p[i][k+l] = 2.*p[i][k] - p[i][k-l] + a*dp; 

} 

/* Right boundary. */ 
p[ni][k+l] = 0.; 

/*@ assert a n a lytic _er ror (p , ni , ni , k + 1, a, dt); */ 

} 

return p; 



C Screenshot 



This is a screenshot of gWhy: we have the list of all the verification conditions and their proof 
status with respect to the various automatic tools. 
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/* Local variables. '/ 
double alj a, dp, dx; 

dx = l./ni; 

assert 

@ dx > 0. &fi dx <= B.5 &£, 



\abs(\e: 

V 



t(dx) - dx) / dx <= 0xl.p-53; 



/* Compute the c 

al = dt/dx*v; 

a = al*al; 

/*@ assert 

@ 9 <= a <= 1 6& 

@ 9c \exact(a) <= 1 55 

@ \round_error(a) <- Oxl.p-49; 



f Allocate space-t. 
p = array2d_alloc ( n. 



: of the stiffni 
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D Fundamental Solution and Jacobi Polynomials 

Let A be the sequence defined in Section [5.1.2| Note that adding zero initial values for the fictitious 
time step k — — 1 makes this sequence be a time shift by one of the fundamental solution of the 
discrete acoustic wave equation, associated with the input data Sh = 0, uo,h = 0, for all i, i ^ 0, 
u\ t i — 0, and Ui r o = 1- The items of the sequence satisfy the following equations: 

(32) Vi, 

(33) V?: ^ 0, 

(34) Vi,Vfc>0, 

We want to prove Lemma [2] i.e. that for all i,k, k > 0, we have A^" > 0. 

The proof is highly indebted to computer algebra: we use a generating function to obtain a 
closed-form expression for the A's, the creative telescoping method of Zeilberger [36] to express 
those A's in terms of Jacobi polynomials, and finally a result by Askey and Gasper [2] to ensure 
the nonnegativity. We have not mechanically checked this proof. For example, the Askey and 
Gasper result would have required enormous Coq developments, but parts of it could have been 
formalized, in particular Zeilberger's algorithm provides a certificate that allows an easy validation 
of its result. 

Consider the associated bivariate generating function formally defined by 



K = 0, 

A" = 0, Aq = 1, 

-\ fc+1 = a(A?_i + ALJ + 2(1 - a)A* - A*" 1 . 



A(X,T) = J2 E >H xiTk - 

i k>-l 
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The above recurrence relation (34) rewrites 



i k>0 \ i k>0 i k>0 j 

+2(1 - a) E X i Xi T k " E E A? -1 ** 2 *- 

i k>Q i k>0 

Since coefficients for fe's smaller than 1 are almost all equal to zero, we formally have 
r5^5^Aj +1 X i T* = A(X,T)-1, J2J2 X i-i XiTk = XA (X>T), 

i k>0 i k>0 

>H+i xl T k = A(JC, T), ^ ^ Aj X'T* = A(X, T), 

i fc>0 i fc>0 

£ ^ X^ 1 X l T k = TA(X, T). 

i k>0 

Therefore, the generating function satisfies 

A{X ^ ] " 1 = a(x+^)A(X,T) + 2(l-a)A(X,T)-TA(X,T), 



which solution is given by 



A(X,T) 



(1-T) 2 



1- a- 



T fl-X 



I 1-T 



Now, we can evaluate the power series expansion of the generating function A. Using the 
following power series expansion, valid for |u| < 1, 



1 



p + n 



n>0 V y 



(1 - u)p+ 1 

and the properties of binomial coefficients, we successively have 



A(X,T) 



fl - T) 2 ^ a? ' 



T) 2 ^— ' X" \ 1 - T 

E a " E f 2n ) (-i)'^"" E f 2 " + 1 + '' V/ 



n>0 i=0 



fc>0 



V 2n + 1 



n—k 



E^E^Ef^-V^'W-D"^" 



fc>0 n=|i| 



n + i/V 2n+l 



Finally, by identification of the two power scries expansions of the generating function A, we have 
for all i, fc, < |«| < k, 



(35) 



1=1*1 



n + i/ V 2n + 1 



For ? = 0, sharp eyes may recognize that A§ = X)ri=o — 2a) where the P„'s are Legendre 
polynomials. Fejer showed in [25) the nonnegativity of the sum of Legendre polynomials when the 
argument is in [—1, 1], which is satisfied here since we consider < a < 1. More generally, we have 
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\!? = a' 1 ' Xm=o -Pn 2 ' 1 '' ^! — 2a) where the P^^'s are Jacobi polynomials. Askey and Gasper 
generalized in [2127] Fejer's result for j3 > and a + j3 > —2. See also pQ, pages 314 and 384. 

Indeed, from the definition of Jacobi polynomials, for all a,f3 > —1, for all n 6 N, for all 
as e [-1,1], 



we have, for all i, k, < i < k, 



n + a\ (n + f3\ fx + 1 x '' 
P )\ n ~P. 



x - 1 



k—i n 
k n—i p 

EEE 

n—i p—0 q=0 
k n n 

EEE 

n—i p—i q—p 
k n k 

EEE 

n—i p—i q—n 



n + 2i 
P 



n 

n — p 



n + i\ In — i\ p 



P 



P 



n + 1 \ I n — i \ In — p 
n — p) \n — p J \n — q 

q + i\fq-i\fq-p 
p + ij \p — i) \n— p 



(l-a) p (-a) n ~ p 

f_]\n-p+q+i a n-p+q 

{-l) q+l a q 



{-l) n+l a n 



We have successively shifted n by i, replaced n — p by p, then shifted q by p. To obtain the last 
equality, notice that the previous triple sum is actually taken over {(n,p, q) £ N 3 /i < p < q < 
n < k}, hence we can take q in [i..k], p in [i..q], n in [^../c], and then switch notations n and g, 
and use the symmetry of binomial coefficients. Identifying with the expression of (35 ), we are led 
to prove, for all i, n, fc, < i < n < fc, 



(36) 



EE 



p + ij \P — ij \n — p 



2n 

n + i 



n + k + 1 
2n + l 



p—t q—n 

Suppose we have the following identity, for all i, n, fc, < i < n < k 7 



(37) 



E 



k + i\ fk — i\ (k — p 



p + i J \p — ij \n — p 



2n 
n + i 



k + n 
2n 



Then, we would have 

n k 



EE 

p—i q—n 



q + i\fq-i\fq-p 

p + ij \p — ij \n — p 



E 



2n 
n + i 



q + n 
2n 



2n 
n + i 



k+n 

E 

q=2n 



2n 



2n 

n + i 



k + n + 1 
2n + l 



The last equality comes directly from the recurrence relation for the binomial coefficients (column- 
sum property of Pascal's triangle) . 

Proving identity (37) is a bit more technical. The hypergeometric nature of its terms makes it 
a good candidate for Zeilberger's algorithm, a.k.a. the method of creative telescoping, see [3"8"ll3"5] . 
Let us introduce some new notations, for all i, n, k, < i < n < k, 



F(i,n,k;p) = 



k + i 



k — p 



v p + i J \p — I j \n — p 
f(i,n,k) = ^2F(i,n,k;p), 



p 



g(i,n,k) 



■In 

n + i 



k + n 
2n 
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Note that F vanishes when p is outside the interval [i..n]. Thus, identity (37) now writes / = g. 

Let / (resp. N, K, and P) be the forward shift operator in i (resp. in n, k, and p). E.g., 
If(i, n, k) = f(i + 1, n, k). We first assume that the function / satisfies the following first order 
recurrence relations, for all i,n,k, < i < n < k, 

(38) (n + l+i)If = (n-i)f, 

(39) (n+l + i)(n + l-i)Nf = (k + 1 + n){k - n)f, 

(40) (k + 1 - n)Kf = (k + l + n)f. 

Then, it is easy to show that the function g satisfies exactly the same first order recurrence 
relations, and since /(0,0,0) = g(0,0,0) = 1, we have / = g. Indeed, simply using the symmetry 
and absorption properties of binomial coefficients, we have, for all i, n, fc, < i < n < k, 

g(i + l,n,k) L+i+i) n-i 



g(i, n+l,k) _ (n+t+i) ( in+2) _ k + l + n Cntl) (sn+i) (k + l+n)(k-n) 



g(i,n,k) ( n 2 ; i 4 )( fe +") " + ! + ^ („+■) ( k L n ) (n + l + i)(n+l-<)' 

g(i,n,k + l) _ ( fc+ 2 1 + n ) _ k + l + n 



g(i,n,k) 



Finding the first order recurrence relations for /, i.e. equations (38), (39), and (40), is the 
job of the method of creative telescoping. Actually, Zeilberger's algorithm provides recurrence 
relations for the hypergeometric summand F of the form 



m=0 



b LTn L m F={P-l){R l F) 



where coefficients b^ m are polynomials independent of p, and Ri is a rational function. There are 
actually one such recurrence relation per free variable I (here i, n, and k), and L is the generic 
forward shift operator in the generic free variable I. Thus, since coefficients 6; >m do not depend 
on p, when summing over p, the right-hand terms telescope, and we can deduce similar recurrence 
relations for the sum / 

m 

h, m L m f = 0. 

m=Q 

Note that although those recurrence relations are difficult to obtain, and even to check on the 
sum /, they are easy to check on the summand F (since there is no more sum over p). On has 
just to check the simpler equations with rational expressions 

m jj n PF 
(41) b lfl +J2b hm ^^ = PR l —-R l . 

m—l 

In the present case, Zeilberger's algorithm provides first order recurrence relations with 

(l + 2i)(p-i) 

bi,o=n-t, Oi,i = —{n + 1 + 1), R4 = : , 

bn.a = {k+l + n)(k - n), 6 n ,i = -{n+l + i)(n + 1 - *), R n = ^ ~ " + ' ~ ' ' 



n + 1 — p 

6 fe , = fc + l + n, b kA =-(k+l-n), R k ^ {l ' + ' ){l, ~' ] 
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And, simply using the symmetry and absorption properties of binomial coefficients, we successively 
have, for I = i, 

(k + l + i) (p - i) 



(41) o (n-i)-(n + l + i) , . . , 
H — ' v ; v ' (p + 1 + i) (k - i) 

_ (l + 2i)(p+l-i) (k-p) (k-p) (n-p) _ (l + 2i)(p-i) 

(k-i) (p + l + i) (p + l-i) (k- p) (k - i) 

^ (n- i)(k - i)(p + l + i)-(n + l + i)(k + 1 + i)(p - i) 

= (l + 2i)(k-p)(n-p) - (l + 2i)(p-i)(p + l + i) 

«*■ [(2n + l)(2fe + 1) - (2n + l)(2p + 1) - (2fc + l)(2p + 1) + (1 + 2i) 2 } 

= (1 + 2i) [(fc-p)(n-p) - (p-i)(p + l + i)} 

(l + 2i) 2 (2p+l) 2 . ... i .. 
«■ , ~ , =-(p-t)(p+l + »), 



for ^ = n, 



(41) <^> (fe+l + n)(fc-n)-(n+l + i)(n + l-i)- 



(n + 1 - p) 

(A; — n)(p + 1 + i)(p + 1 — i) (k — p) (k — p) (n — p) (k — n)(p + i)(p — i) 
(n — p) (p + 1 + i) (p + 1 — i) (k — p) (n + l — p) 

<^> (n + 1 + k)(n + 1 - p) - (n + 1 + i)(n + 1 — i) = (k - p)(n + 1 - p) — (p + i)(p — i) 
<=> (n + l)(fc -p) - kp + i 2 = (k-p)(n + 1) - kp + i 2 , 



and for / = fc, 



(41| ^ (k + l+n)- (k + l-n 



(k + l + i) (k + l-i) (k + l-p) 



(k + l-p) (k+l-p) (k + l-n) 
_ (p+ 1 + i)(p + 1 — i) (k—p) (k — p) (n-p) (p + i)(p — i) 
(k-p) (p + 1 + i) (p + 1 - i) (k - p) (k + 1 - p) 

(k + 1 + n)(fc + l-p)-(fc + l + i)(fc + 1 - «) = (n-p)(k + 1 - p) - (p + i)(p - i) 
O (fc + l)(n — p) — np + i 2 = (n — p)(fe + 1) — rap + i 2 



which are all true. 
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